Extra 3.2 - Historical Provenance - Application 3: RRG Chat Messages

Identifying instructions from chat messages in the Radiation Response Game.

In this notebook, we explore the performance of classification using the provenance of a data entity itself (its ancestry) instead of the provenance of its dependants (as shown here and in the paper). To distinguish between the two, we call the former historical provenance and the latter forward provenance. Apart from using historical provenance, all other steps are the same as in the original experiments.

  • Goal: To determine if the provenance network analytics method can identify instructions from the provenance of a chat message.
  • Classification labels: $\mathcal{L} = \left\{ \textit{instruction}, \textit{other} \right\} $.
  • Training data: 69 chat messages manually categorised by HCI researchers.

Reading data

The RRG dataset based on historical provenance is provided in the rrg/ancestor-graphs.csv file, which contains a table whose rows correspond to individual chat messages in RRG:

  • First column: the identifier of the chat message
  • label: the manual classification of the message (e.g., instruction, information, requests)
  • The remaining columns provide the provenance network metrics calculated from the historical provenance graph of the message.

Note that in this extra experiment, we use the full (historical) provenance of a message without limiting how far back it extends. Hence, there is no $k$ parameter in this experiment.


In [1]:
import pandas as pd

In [2]:
filepath = "rrg/ancestor-graphs.csv"

In [3]:
df = pd.read_csv(filepath, index_col=0)
df.head()


Out[3]:
label entities agents activities nodes edges diameter assortativity acc acc_e ... mfd_e_a mfd_e_ag mfd_a_e mfd_a_a mfd_a_ag mfd_ag_e mfd_ag_a mfd_ag_ag mfd_der powerlaw_alpha
21 requests 186 7 21 214 469 7 0.012152 0.488348 0.445533 ... 22 19 34 22 19 0 0 0 37 2.924960
20 commissives 183 7 20 210 461 7 0.007546 0.487386 0.446461 ... 22 19 33 22 19 0 0 0 37 2.858642
23 assertives 216 7 23 246 543 7 -0.001550 0.489050 0.447828 ... 26 22 38 26 19 0 0 0 46 2.867888
25 instruction 220 7 24 251 553 7 0.002591 0.489752 0.447110 ... 26 22 38 26 19 0 0 0 46 2.891161
24 instruction 219 7 24 250 551 7 0.002284 0.489859 0.447021 ... 26 22 38 26 19 0 0 0 46 2.928098

5 rows × 23 columns
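
As a quick sanity check, the number of rows should match the 69 manually categorised messages mentioned above. The following illustrative line (not part of the original analysis) verifies this:

len(df)  # expected to be 69: one row per categorised chat message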

Labelling data

Since we are only interested in the instruction messages, we categorise the messages into two sets: instruction and other.

Note: This section is just an example to show the data transformation to be applied to each dataset.


In [4]:
label = lambda l: 'other' if l != 'instruction' else l  # map every label other than 'instruction' to 'other'

In [5]:
df.label = df.label.apply(label).astype('category')
df.head()


Out[5]:
label entities agents activities nodes edges diameter assortativity acc acc_e ... mfd_e_a mfd_e_ag mfd_a_e mfd_a_a mfd_a_ag mfd_ag_e mfd_ag_a mfd_ag_ag mfd_der powerlaw_alpha
21 other 186 7 21 214 469 7 0.012152 0.488348 0.445533 ... 22 19 34 22 19 0 0 0 37 2.924960
20 other 183 7 20 210 461 7 0.007546 0.487386 0.446461 ... 22 19 33 22 19 0 0 0 37 2.858642
23 other 216 7 23 246 543 7 -0.001550 0.489050 0.447828 ... 26 22 38 26 19 0 0 0 46 2.867888
25 instruction 220 7 24 251 553 7 0.002591 0.489752 0.447110 ... 26 22 38 26 19 0 0 0 46 2.891161
24 instruction 219 7 24 250 551 7 0.002284 0.489859 0.447021 ... 26 22 38 26 19 0 0 0 46 2.928098

5 rows × 23 columns

Balancing data

This section explores the balance of the RRG datasets.


In [6]:
# Examine the balance of the dataset
df.label.value_counts()


Out[6]:
other          37
instruction    32
Name: label, dtype: int64

Since both labels have roughly the same number of data points, we decide not to balance the RRG datasets.
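
For reference, had the two labels been heavily skewed, a simple remedy would be to under-sample the majority class before training. A minimal sketch using pandas is shown below; the name df_balanced is hypothetical, and this step is not applied in this experiment:

# Hypothetical balancing step (illustration only, not run here):
# under-sample the majority class so both labels have equally many rows.
n_minority = df.label.value_counts().min()
df_balanced = df.groupby('label', group_keys=False).apply(
    lambda g: g.sample(n_minority, random_state=0)
)
df_balanced.label.value_counts()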

Cross validation

We now run the cross validation tests on the datasets using all the features (combined), only the generic network metrics (generic), and only the provenance-specific network metrics (provenance). Please refer to Cross Validation Code.ipynb for the detailed description of the cross validation code.


In [7]:
from analytics import test_classification

In [8]:
results, importances = test_classification(df, n_iterations=1000)


Accuracy: 64.07% ±1.1212 <-- combined
Accuracy: 66.20% ±1.1259 <-- generic
Accuracy: 61.03% ±1.1090 <-- provenance
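
For readers without the accompanying analytics module, the sketch below shows roughly what a single cross-validation run might look like, assuming a decision-tree classifier and a hand-picked (hypothetical) subset of the generic network metrics; test_classification repeats such runs over many iterations and over the three feature sets above. See Cross Validation Code.ipynb for the actual implementation:

# Rough sketch of one cross-validation run (assumptions: decision-tree
# classifier, illustrative feature subset); not the packaged implementation.
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

generic_metrics = ['entities', 'agents', 'activities', 'nodes',
                   'edges', 'diameter', 'assortativity']  # hypothetical subset
scores = cross_val_score(DecisionTreeClassifier(), df[generic_metrics], df.label, cv=10)
print("Accuracy: {:.2%} ±{:.4f}".format(scores.mean(), scores.std()))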

Results: Compared with the top accuracy achieved using forward provenance (85%), using historical provenance in this application yields a much lower accuracy (66%). This supports our hypothesis that the forward provenance of a data entity correlates with its nature/characteristics better than its historical provenance does, as the forward provenance records how the data entity was used.